From Predicting Predominant Senses to Local Context for Word Sense Disambiguation

Authors

  • Rob Koeling
  • Diana McCarthy
Abstract

Recent work on automatically predicting the predominant sense of a word has proven promising (McCarthy et al., 2004). The predicted sense can be applied as a first sense heuristic in Word Sense Disambiguation (WSD) tasks without the need for expensive hand-annotated data sets. Because the sense distribution of many words is heavily skewed (Yarowsky and Florian, 2002), the first sense heuristic is often hard to beat. However, the local context of an ambiguous word can give important clues to which of its senses was intended. The sense ranking method proposed by McCarthy et al. (2004) uses a distributional similarity thesaurus: the k nearest neighbours of a word in the thesaurus are used to establish its predominant sense. In this paper we report on a first investigation of how the grammatical relations in which the target word is involved can be used to select a subset of the neighbours from the automatically created thesaurus, so that the local context is taken into account. This unsupervised method is quantitatively evaluated on SemCor. We found a slight improvement in precision over using the predicted first sense. Finally, we discuss strengths and weaknesses of the method and suggest ways to improve the results in the future.
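
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a ranking score in the spirit of McCarthy et al. (2004), where each neighbour contributes its distributional similarity, shared among the senses in proportion to a sense-to-neighbour similarity. It also assumes that a neighbour counts as relevant to the local context if it has been seen in the same grammatical relation as the target word. The function names, the toy words and all similarity values are hypothetical.

```python
from collections import defaultdict


def rank_senses(senses, neighbours, sense_sim):
    """Rank senses by a neighbour-weighted score (assumed scoring scheme).

    neighbours: list of (neighbour, distributional_similarity) pairs.
    sense_sim:  dict mapping (sense, neighbour) to a similarity value.
    """
    scores = defaultdict(float)
    for neighbour, dist_sim in neighbours:
        # Normalise over all senses so each neighbour shares out its weight.
        total = sum(sense_sim.get((s, neighbour), 0.0) for s in senses)
        if total == 0.0:
            continue  # this neighbour says nothing about any sense
        for s in senses:
            scores[s] += dist_sim * sense_sim.get((s, neighbour), 0.0) / total
    return sorted(senses, key=lambda s: scores[s], reverse=True)


def filter_by_context(neighbours, relation, seen_relations):
    """Keep neighbours attested in the target's grammatical relation.

    Backs off to the full neighbour list when nothing matches
    (an assumption of this sketch, not a claim about the paper).
    """
    kept = [(n, sim) for n, sim in neighbours
            if relation in seen_relations.get(n, set())]
    return kept or neighbours


# Hypothetical toy data for the noun "star".
SENSES = ["star.celestial", "star.celebrity"]
NEIGHBOURS = [("planet", 0.30), ("galaxy", 0.25),
              ("actor", 0.20), ("singer", 0.15)]
SENSE_SIM = {
    ("star.celestial", "planet"): 0.9, ("star.celebrity", "planet"): 0.1,
    ("star.celestial", "galaxy"): 0.8, ("star.celebrity", "galaxy"): 0.2,
    ("star.celestial", "actor"): 0.1, ("star.celebrity", "actor"): 0.9,
    ("star.celestial", "singer"): 0.1, ("star.celebrity", "singer"): 0.9,
}
SEEN_RELATIONS = {
    "planet": {("dobj", "observe")},
    "galaxy": {("dobj", "observe")},
    "actor": {("dobj", "interview")},
    "singer": {("dobj", "interview")},
}

# Predominant sense from all k neighbours.
global_ranking = rank_senses(SENSES, NEIGHBOURS, SENSE_SIM)

# Context-sensitive ranking for "interview the star".
context_neighbours = filter_by_context(
    NEIGHBOURS, ("dobj", "interview"), SEEN_RELATIONS)
context_ranking = rank_senses(SENSES, context_neighbours, SENSE_SIM)

print(global_ranking[0], context_ranking[0])
```

With these toy values the full neighbour list favours the celestial sense, while restricting the neighbours to those seen as the direct object of "interview" favours the celebrity sense. The back-off to the full list keeps the predicted first sense available when no neighbour matches the context.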

Similar resources

Finding Predominant Word Senses in Untagged Text

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of handtagged data. Whilst there are a few hand-ta...

Using automatically acquired predominant senses for Word Sense Disambiguation

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The first (or predominant) sense heuristic assumes the availability of handtagged data. Whilst there are hand-tagged corpora available for some languages, these are relatively small in size and many word forms either do not o...

An Eigenvalue-Based Measure for Word-Sense Disambiguation

Current approaches for word-sense disambiguation (WSD) try to relate the senses of the target words by optimizing a score for each sense in the context of all other words’ senses. However, by scoring each sense separately, they often fail to optimize the relations between the resulting senses. We address this problem by proposing a HITS-inspired method that attempts to optimize the score for th...

Text Categorization for Improved Priors of Word Meaning

Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant (most frequent) sense of a word when contextual clues are not strong enough. The topic domain of a document has a strong influence on the sense distribution of words. Unfortunately, it is not feasible to produce large manually sense-an...

A Practical Solution to the Problem of Automatic Word Sense Induction

Recent studies in word sense induction are based on clustering global co-occurrence vectors, i.e. vectors that reflect the overall behavior of a word in a corpus. If a word is semantically ambiguous, this means that these vectors are mixtures of all its senses. Inducing a word’s senses therefore involves the difficult problem of recovering the sense vectors from the mixtures. In this paper we a...

Publication date: 2008